The Use of Apprenticeship Learning Via Inverse Reinforcement Learning for Generating Melodies
Abstract
The research presented in this paper uses apprenticeship learning via inverse reinforcement learning to ascertain a reward function in a musical context. The learning agent then used this reward function to generate new melodies via reinforcement learning. Reinforcement learning is a branch of machine learning in which rewards guide an agent's learning; these rewards are usually specified manually. In a musical setting, however, manually specifying a reward function is difficult. Apprenticeship learning via inverse reinforcement learning addresses such cases: given examples of expert behaviour, it recovers a reward function. In this research, melodies composed by the authors served as the expert behaviour from which the learning agent discovered a reward function, which it then used to generate new melodies. This paper is presented as a proof of concept; the results show that the approach can generate new melodies, although further work is needed to build on the rudimentary learning agent presented here.
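The pipeline the abstract describes (expert melodies → recovered reward function → new melodies) can be sketched with the projection variant of apprenticeship learning (Abbeel & Ng, 2004) on a toy melody MDP. The state space, action set, interval features, and expert melodies below are illustrative assumptions for the sketch, not the paper's actual representation:

```python
import numpy as np

# Toy melody MDP (assumed for illustration): state = scale degree 0..7,
# action = melodic interval in {-2..2}; reward features are a one-hot
# over the realized interval size (0 = repeat, 1 = step, 2 = leap).
PITCHES, ACTIONS, GAMMA, HORIZON = 8, [-2, -1, 0, 1, 2], 0.9, 7

def step(s, a):
    return min(max(s + a, 0), PITCHES - 1)

def features(s, a):
    phi = np.zeros(3)
    phi[abs(step(s, a) - s)] = 1.0
    return phi

def greedy_policy(w, sweeps=60):
    """Value iteration under the linear reward r(s, a) = w . phi(s, a)."""
    V = np.zeros(PITCHES)
    for _ in range(sweeps):
        V = np.array([max(w @ features(s, a) + GAMMA * V[step(s, a)]
                          for a in ACTIONS) for s in range(PITCHES)])
    return [max(ACTIONS, key=lambda a, s=s: w @ features(s, a) + GAMMA * V[step(s, a)])
            for s in range(PITCHES)]

def feature_expectations(policy, start=0):
    """Discounted feature counts accumulated by following a policy."""
    mu, s = np.zeros(3), start
    for t in range(HORIZON):
        a = policy[s]
        mu += GAMMA ** t * features(s, a)
        s = step(s, a)
    return mu

# "Expert" melodies (stand-ins for the authors' demonstrations): stepwise motion.
expert = [[0, 1, 2, 3, 4, 3, 2, 1], [4, 5, 4, 3, 4, 5, 6, 5]]
mu_E = np.zeros(3)
for mel in expert:
    for t in range(len(mel) - 1):
        phi = np.zeros(3)
        phi[abs(mel[t + 1] - mel[t])] = 1.0
        mu_E += GAMMA ** t * phi
mu_E /= len(expert)

# Projection-style apprenticeship learning: alternate between fitting reward
# weights w = mu_E - mu_bar and solving the MDP that reward induces.
policy = greedy_policy(np.random.default_rng(0).standard_normal(3))
mu_bar = feature_expectations(policy)
for _ in range(20):
    w = mu_E - mu_bar
    if np.linalg.norm(w) < 1e-3:  # expert feature counts matched
        break
    policy = greedy_policy(w)
    mu = feature_expectations(policy)
    d = mu - mu_bar
    mu_bar = mu_bar + ((d @ w) / (d @ d + 1e-12)) * d  # project mu_bar toward mu

# Generate a new melody with the policy learned under the recovered reward.
melody = [0]
for _ in range(HORIZON):
    melody.append(step(melody[-1], policy[melody[-1]]))
print(melody)  # stepwise motion, mirroring the expert's style
```

Because the expert melodies move only by steps, the recovered reward weights favour the step feature, and the generated melody inherits that stepwise character.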
Similar Resources
Bayesian Inverse Reinforcement Learning
Inverse Reinforcement Learning (IRL) is the problem of learning the reward function underlying a Markov Decision Process given the dynamics of the system and the behaviour of an expert. IRL is motivated by situations where knowledge of the rewards is a goal by itself (as in preference elicitation) and by the task of apprenticeship learning (learning policies from an expert). In this paper we sh...
Apprenticeship learning with few examples
We consider the problem of imitation learning when the examples, provided by an expert human, are scarce. Apprenticeship Learning via Inverse Reinforcement Learning provides an efficient tool for generalizing the examples, based on the assumption that the expert’s policy maximizes a value function, which is a linear combination of state and action features. Most apprenticeship learning algorith...
Apprenticeship Learning About Multiple Intentions
In this paper, we apply tools from inverse reinforcement learning (IRL) to the problem of learning from (unlabeled) demonstration trajectories of behavior generated by varying “intentions” or objectives. We derive an EM approach that clusters observed trajectories by inferring the objectives for each cluster using any of several possible IRL methods, and then uses the constructed clusters to qu...
Merge or Not? Learning to Group Faces via Imitation Learning
Given a large number of unlabeled face images, face grouping aims at clustering the images into individual identities present in the data. This task remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representation. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and no...
Batch, Off-Policy and Model-Free Apprenticeship Learning
This paper addresses the problem of apprenticeship learning, that is, learning control policies from expert demonstrations. An efficient framework for it is inverse reinforcement learning (IRL). Based on the assumption that the expert maximizes a utility function, IRL aims at learning the underlying reward from example trajectories. Many IRL algorithms assume that the reward function is lin...